IEEE Transactions on Medical Imaging
● Institute of Electrical and Electronics Engineers (IEEE)
Preprints posted in the last 90 days, ranked by how well they match IEEE Transactions on Medical Imaging's content profile, based on 18 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Li, S.; Gao, J.; Kim, C.; Choi, S.; Chen, Q.; Wang, Y.; Wu, S.; Zhang, Y.; Huang, T.; Zhou, Y.; Yao, B.; Yao, Y.; Li, C.
Three-dimensional (3D) handheld photoacoustic tomography typically relies on bulky and expensive external positioning trackers to correct motion artifacts, which severely limits its clinical flexibility and accessibility. To address this challenge, we present PA-SfM, a tracker-free framework that leverages exclusively single-modality photoacoustic data for both sensor pose recovery and high-fidelity 3D reconstruction via differentiable acoustic radiation modeling. Unlike traditional Structure-from-Motion (SfM) methods that formulate pose estimation as a geometry-driven optimization over visual features, PA-SfM integrates the acoustic wave equation into a differentiable programming pipeline. By leveraging a high-performance, GPU-accelerated acoustic radiation kernel, the framework simultaneously optimizes the 3D photoacoustic source distribution and the sensor array pose via gradient descent. To ensure robust convergence in freehand scenarios, we introduce a coarse-to-fine optimization strategy that incorporates geometric consistency checks and rigid-body constraints to eliminate motion outliers. We validated the proposed method through both numerical simulations and in vivo rat experiments. The results demonstrate that PA-SfM achieves sub-millimeter positioning accuracy and restores high-resolution 3D vascular structures comparable to ground-truth benchmarks, offering a low-cost, software-defined solution for clinical freehand photoacoustic imaging. The source code is publicly available at https://github.com/JaegerCQ/PA-SfM.
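A minimal sketch of the joint pose-and-source optimization this abstract describes, with a toy 1D differentiable forward model standing in for the paper's GPU-accelerated acoustic radiation kernel; the two-view setup, names, and hyperparameters are illustrative assumptions, and the coarse-to-fine schedule and geometric consistency checks are omitted.

```python
import torch

torch.manual_seed(0)
n = 128
x = torch.linspace(-1.0, 1.0, n)

def forward(source, shift):
    # Toy differentiable "acoustic" view: the array sees the source
    # translated by the (unknown) probe shift; linear interpolation keeps
    # the sampling differentiable in both the source and the shift.
    idx = ((x - shift + 1.0) / 2.0 * (n - 1)).clamp(0, n - 2)
    i0 = idx.floor().long()
    w = idx - i0.float()
    return (1 - w) * source[i0] + w * source[i0 + 1]

# Ground truth: smooth "vascular" sources observed from two probe poses.
true_source = sum(torch.exp(-((x - c) ** 2) / (2 * 0.15 ** 2)) for c in (-0.5, 0.1, 0.6))
true_shift = torch.tensor(0.1)
views = [forward(true_source, torch.tensor(0.0)) + 0.01 * torch.randn(n),
         forward(true_source, true_shift) + 0.01 * torch.randn(n)]

# Jointly optimize the source distribution and the second pose by gradient
# descent, mirroring PA-SfM's simultaneous source/pose optimization.
source = torch.zeros(n, requires_grad=True)
shift = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([source, shift], lr=0.02)
for step in range(1000):
    opt.zero_grad()
    loss = ((forward(source, torch.zeros(())) - views[0]) ** 2).mean() \
         + ((forward(source, shift) - views[1]) ** 2).mean()
    loss.backward()
    opt.step()
print(f"recovered shift {shift.item():+.3f} (true {true_shift.item():+.3f})")
```

Pinning the first pose removes the gauge freedom between source translation and probe shift, which is the same role multi-view consistency plays in SfM.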
Chen, C.; Huiru, W.; Peilin, G.; Xi, C.; Ren, J.
Clearing Assisted Scattering Tomography (CAST) extends coherent scattering tomography to whole-brain imaging, enabling visualization of fine-scale brain-wide connectivity. As a coherent optical tomography modality, CAST is inherently affected by speckle noise, which degrades image quality and limits quantitative analysis. However, existing speckle reduction methods developed for optical coherence tomography (OCT) are not directly transferable to CAST images due to differences in sample and noise statistics. Here, we present a learning-based cleared-sample speckle reduction network, termed CLEAR Net, specifically designed for CAST imaging, which effectively suppresses speckle noise in whole-brain white matter images while preserving fine structural details. We quantitatively benchmarked CLEAR Net against representative speckle reduction algorithms on CAST datasets and further evaluated its generalizability using publicly available ophthalmic datasets.
Chato, L.; Sereda, T.
In this paper, we present a novel and efficient framework for cross-modality medical image synthesis, developed for BraSyn-Task 8. Our method combines the fast-sampling capabilities of the Fast-Denoising Diffusion Probabilistic Model (Fast-DDPM) with Discrete Wavelet-Transformed components, as used in Conditional Wavelet Diffusion Models. By reducing the number of denoising steps to 100 and using wavelet-transformed inputs, we accelerate both training and inference and reduce memory usage while preserving high image quality. The framework was trained on the BraTS 2025 dataset, which includes four magnetic resonance imaging (MRI) modalities: T1-weighted, contrast-enhanced T1-weighted (T1c), T2-weighted, and FLAIR. We developed four independent models, each synthesizing one missing modality from the remaining three. Evaluation on the BraSyn 2025 Task 8 public validation set demonstrated competitive performance using standard image metrics: mean squared error, signal-to-noise ratio, and structural similarity index. Our method achieved third place in the challenge on the final test data, with fast inference times (average 41-67 seconds per case). To assess clinical relevance, we applied a pretrained nnU-Net segmentation model on the synthesized modalities. Segmentation results yielded high Dice coefficients: 0.877 for the whole tumor, 0.769 for the tumor core, and 0.667 for the enhancing tumor. These results confirm the effectiveness and reliability of our approach for missing-modality synthesis, enabling accurate downstream analysis in high-dimensional medical imaging tasks. Our team in the challenge is USD-2025-Chato-Sereda (Team ID: 3551654). GitHub link: https://github.com/tsereda/brats-synthesis
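A minimal sketch of the wavelet-domain input preparation the abstract describes, assuming BraTS-style 240x240 slices; the function name, shapes, and one-level Haar choice are illustrative assumptions, and the Fast-DDPM sampler itself is not reproduced.

```python
import numpy as np
import pywt

def to_wavelet_channels(slice_3mods: np.ndarray) -> np.ndarray:
    """Stack one-level 2D DWT sub-bands of the three available MRI
    modalities into one multi-channel conditioning tensor. Each HxW
    modality becomes four (H/2)x(W/2) sub-bands (LL, LH, HL, HH),
    halving spatial size and so cutting diffusion-model memory/compute."""
    channels = []
    for mod in slice_3mods:                      # input shape (3, H, W)
        cA, (cH, cV, cD) = pywt.dwt2(mod, "haar")
        channels.extend([cA, cH, cV, cD])
    return np.stack(channels)                    # output (12, H/2, W/2)

cond = to_wavelet_channels(np.random.rand(3, 240, 240).astype(np.float32))
print(cond.shape)  # (12, 120, 120)
# After the diffusion model predicts the four sub-bands of the missing
# modality, pywt.idwt2 reconstructs the full-resolution image.
```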
Maidu, B.; Gonzalo, A.; Guerrero-Hurtado, M.; Bargellini, C.; Martinez-Legazpi, P.; Bermejo, J.; Contijoch, F.; Flores, O.; Garcia-Villalba, M.; McVeigh, E.; Kahn, A.; del Alamo, J. C.
Atrial fibrillation (AF) promotes blood stasis and thrombus formation, most often within the left atrial appendage (LAA), and can lead to stroke or transient ischemic attack (TIA). Time-resolved contrast-enhanced computed tomography (4D CT) captures left atrial (LA) opacification and washout, but it does not directly provide quantitative stasis metrics such as blood residence time. Patient-specific computational fluid dynamics (CFD) can quantify LA/LAA residence time, yet routine clinical use is limited by computational cost and sensitivity to patient-specific boundary conditions. Here, we present two complementary approaches to infer time-resolved 3D residence time fields directly from contrast dynamics. First, a physics-informed neural network (PINN) treats contrast as a passive scalar and jointly reconstructs velocity and residence time by enforcing the incompressible Navier-Stokes equations and transport equations for contrast concentration and residence time in moving, patient-specific LA anatomies. Second, an indicator dilution theory (IDT) formulation computes voxelwise, time-resolved residence time maps from contrast time curves alone by constructing a PV-referenced impulse response and modeling transport with a tank-in-series model with spatially dependent parameters. Both methods are benchmarked against patient-specific CFD in six cases spanning diverse LA function, including three patients with TIA or thrombus in the LAA and three patients free of events. Both approaches reproduce expected spatial and temporal trends, with higher residence time in the distal LAA and higher LAA residence time in cases with TIA or thrombus. IDT demonstrates the closest agreement with CFD across the full range of residence times and produces maps in seconds, facilitating clinical translation. In contrast, the PINN additionally recovers phase-dependent atrial flow structures, but tends to smooth and underestimate the highest residence-time regions and requires hours of training. Together, these results support a scalable workflow in which IDT enables rapid stasis screening from contrast CT, and PINNs provide a complementary pathway for detailed, patient-specific hemodynamic inference when full-field flow information is needed.
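As a worked reference for the tanks-in-series ingredient, the standard n-tank residence-time distribution and the voxelwise convolution an IDT formulation of this kind could use; the spatially dependent parameters n(x) and τ(x) and the PV-referenced input come from the abstract, but the exact parameterization below is an assumption.

\[
E_{n,\tau}(t) = \frac{t^{\,n-1}}{(n-1)!\,\tau^{n}}\, e^{-t/\tau},
\qquad \bar{t} = n\tau,
\qquad
c_{\mathrm{vox}}(\mathbf{x}, t) = \big(c_{\mathrm{PV}} * E_{n(\mathbf{x}),\,\tau(\mathbf{x})}\big)(t),
\]

so fitting n(x) and τ(x) to each voxel's contrast time curve yields a residence-time map directly from the imaging data, without solving the flow field.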
Shaul, O.; Ilovitsh, T.
Beam shaping of ultra-short pulses is essential for medical ultrasound, where single-cycle excitations are required to achieve high axial resolution and improve frame rate. Conventional methods, such as the Gerchberg-Saxton (GS) algorithm or more recent deep learning approaches, are generally effective for continuous-wave excitation but degrade significantly under single-cycle conditions. In diagnostic imaging, high frame rate is critical for applications demanding rapid scanning. In this context, multi-line transmission (MLT) leverages beam shaping to synthesize multiple simultaneous foci, thereby increasing frame rate. In parallel, structured illumination methods for super-resolution and acoustical holography likewise depend on actively shaping single-cycle pulses to produce controlled patterns, highlighting the need for precise short-pulse beam shaping. To address this challenge, we introduce the spatio-temporal adaptive reconstruction (STAR) algorithm, which performs active beam shaping directly in the time domain by integrating the generalized angular spectrum method (GASM) into an iterative optimization scheme. STAR enforces constraints on both the transducer and focal planes, enabling accurate control of single-cycle excitations. Simulations showed that STAR consistently outperformed GS for multi-focus patterns. For example, in a four-foci configuration, STAR achieved a correlation of 0.80 compared to 0.64 for GS, with significantly improved uniformity across focal peaks. Resolution analysis demonstrated that STAR reduced the minimum distinguishable foci spacing from 1.09 mm with GS to 0.87 mm. Experimental hydrophone measurements confirmed these improvements. Across multi-foci patterns, STAR produced more distinct and balanced foci compared to those observed with GS. These results demonstrate that STAR provides robust and efficient active beam shaping of single-cycle pulses, maintaining accuracy across different depths and frequencies for diagnostic applications.
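A minimal sketch of conventional single-frequency angular spectrum propagation, the building block that GASM generalizes; the grid size, piston aperture, and medium parameters are illustrative, and STAR's time-domain iteration with dual-plane constraints is not reproduced. A single-cycle pulse would be FFT-decomposed into frequency components, each propagated this way and recombined.

```python
import numpy as np

def angular_spectrum_propagate(p0, dx, f, c, z):
    """Propagate a monochromatic pressure field p0 (2D complex array in the
    source plane) a distance z with the standard angular spectrum method."""
    k = 2 * np.pi * f / c                             # wavenumber in medium
    ny, nx = p0.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dx)
    KX, KY = np.meshgrid(kx, ky)
    kz = np.sqrt((k**2 - KX**2 - KY**2).astype(complex))  # evanescent -> imaginary, decays
    return np.fft.ifft2(np.fft.fft2(p0) * np.exp(1j * kz * z))

# Example: 1 MHz source plane sampled at 0.1 mm, propagated 30 mm in water.
p0 = np.zeros((256, 256), dtype=complex)
p0[96:160, 96:160] = 1.0                              # square piston aperture
pz = angular_spectrum_propagate(p0, dx=1e-4, f=1e6, c=1500.0, z=0.03)
print(np.abs(pz).max())
```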
Dhawan, R.; Agarwal, M.; Jain, S.; Shekhar, H.
Objective: Super-resolution ultrasound (SR-US) reveals microvascular structures with exquisite resolution, but clinical translation remains limited by the need for ultrafast frame rates, massive data volumes, and long reconstruction times. This work proposes a deep learning framework that reconstructs microvascular maps from low-frame-rate enhanced ultrasound sequences, bypassing explicit microbubble localization and tracking. Methods: A transformer-decoder network with learned linear projections was designed to model spatiotemporal dependencies across sparse contrast-enhanced ultrasound sequences and reconstruct vessel probability maps, refined via a post-processing enhancement stage. Single-head self-attention captures temporal correlations under challenging conditions including overlapping microbubbles and low signal-to-noise ratios. Binary cross-entropy loss guided training to preserve vascular topology across synthetic and in vivo datasets. In vivo rat brain bolus data from the PALA challenge was used to evaluate this approach under up to 500-fold data reduction (341 frames at 2 FPS vs. 170,400 frames at 1000 FPS in standard ULM). Results: Despite aggressive undersampling, the proposed pipeline recovered coherent microvascular architecture where conventional ULM pipelines applied to the same sparse data failed to produce continuous vascular networks. Major branches and higher-order microvessels remained visible, with apparent vessel widths broadened by approximately three-fold relative to reference SR-US. End-to-end reconstruction completed in 28-133 seconds on an NVIDIA H100 GPU, depending on the number of frames employed. Conclusion: The reported approach preserved vascular topology with fast reconstruction and low data overhead, albeit at lower resolution. The substantial reduction in frames and computation time highlights the translational potential of this SR-US-inspired microvascular imaging approach.
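A minimal sketch of the single-head temporal self-attention ingredient named in the Methods, assuming hypothetical token shapes; the full transformer-decoder, vessel-probability head, and post-processing stage are not reproduced.

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Single-head self-attention over the frame axis, the kind of block
    described for capturing temporal microbubble correlations. Dimensions
    and placement are assumptions, not the authors' implementation."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)   # learned linear projections
        self.out = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x):                  # x: (batch*pixels, frames, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.out(attn @ v)

tokens = torch.randn(8, 341, 64)           # 341 low-rate frames per sequence
print(TemporalSelfAttention()(tokens).shape)  # torch.Size([8, 341, 64])
```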
Lee, S.; Shivaei, S.; Shapiro, M. G.
Ultrasound is emerging as a method for molecular and cellular imaging by connecting the versatile physics of sound waves to protein-based contrast agents such as gas vesicles (GVs). BURST is a common imaging mode that leverages the strong, transient echoes generated when GVs collapse under acoustic pressure to enable highly sensitive ultrasound visualization of cells and biomolecules, down to the single cell level. However, BURST is vulnerable to fluctuating background signals, with large-amplitude fluctuations in scattering, as often present in vivo, obscuring genuine GV responses. In this study, we mathematically examine this limitation and show that incorporating statistical metrics such as correlation or temporal contrast-to-noise ratio effectively suppresses unwanted non-GV voxels and quantifies detection confidence, including in image sequences in which GV collapse spans multiple frames. Compared with prior methods, our approach enhances the clarity of BURST images and provides probabilistic interpretations of GV signals, facilitating more reliable analysis of ambiguous in vivo molecular imaging, as we demonstrate in imaging tumor-homing probiotics and gene expression in the brain.
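One plausible reading of the temporal contrast-to-noise metric in code form, with hypothetical stack shapes, baseline length, and threshold; the abstract's correlation-based variant and its probabilistic interpretation are not reproduced.

```python
import numpy as np

def temporal_cnr(stack: np.ndarray, n_baseline: int = 10) -> np.ndarray:
    """Per-voxel temporal contrast-to-noise ratio for a BURST-style frame
    stack (frames, H, W): peak post-baseline change relative to baseline
    fluctuations, so voxels whose 'signal' is merely large-amplitude
    background fluctuation score low."""
    base = stack[:n_baseline]
    mu, sigma = base.mean(axis=0), base.std(axis=0) + 1e-9
    return (stack[n_baseline:] - mu).max(axis=0) / sigma

frames = np.random.randn(40, 128, 128)      # stand-in acquisition
frames[12:15, 40:60, 40:60] += 5.0          # GV-collapse-like transient
mask = temporal_cnr(frames) > 4.0           # keep high-confidence GV voxels
print(mask.sum())
```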
Lu, H.; Ashbrook, J.; Dunn, A. K.
Multi-exposure speckle imaging (MESI) estimates flow-related parameters by fitting a physics-based speckle contrast model to measurements acquired over multiple exposure times. In standard pipelines, parameters are recovered via nonlinear least-squares fitting at each pixel, which is computationally expensive and can yield spatially inconsistent maps when uncertainty in the estimated speckle contrast variance (from camera noise and finite spatial/temporal sampling used to compute speckle contrast) is amplified by independent pixel-wise inversion. This work reframes MESI parameter estimation as identification of a globally shared inverse operator of the analytical forward model, exploiting the fact that a single physical mapping governs all pixels while noise drives large variance in independent pixel-wise inversion. Rather than solving millions of iterative optimizations, a single parameterized inverse mapping is learned directly from a single acquired MESI dataset. Physics consistency is enforced by embedding the fixed MESI forward model as an analysis-by-synthesis layer that re-synthesizes speckle contrast curves from the predicted parameters. Training is self-supervised: the inverse mapping is optimized by minimizing a reconstruction loss between measured and re-synthesized speckle contrast curves, which constrains estimates to the set of physically admissible MESI curves without requiring ground-truth parameter labels. Experiments on a numerical MESI phantom with known ground truth and on in vivo mouse cortex data show that the proposed method produces more stable inverse correlation time (ICT, 1/τc) maps and improved spatial coherence relative to conventional per-pixel fitting, while substantially reducing inference time by replacing iterative optimization with a single feed-forward evaluation.
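A minimal self-supervised sketch of the analysis-by-synthesis idea, using the commonly cited MESI speckle-contrast model (e.g., Parthasarathy et al.) as the fixed forward layer; the MLP inverse operator, exposure set, and bounded reparameterizations are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

T = torch.tensor([5e-5, 1e-4, 2.5e-4, 5e-4, 1e-3, 2.5e-3, 5e-3, 1e-2])  # exposures (s)

def mesi_forward(tau_c, beta, rho, nu):
    """Fixed, differentiable MESI speckle-contrast model K^2(T); serves as
    the analysis-by-synthesis layer that re-synthesizes curves."""
    x = T / tau_c.unsqueeze(-1)
    r = rho.unsqueeze(-1)
    return (beta.unsqueeze(-1) * (r**2 * (torch.exp(-2 * x) - 1 + 2 * x) / (2 * x**2)
            + 4 * r * (1 - r) * (torch.exp(-x) - 1 + x) / x**2)
            + nu.unsqueeze(-1))

# One shared inverse operator: a small MLP mapping a measured K^2(T) curve
# to physical parameters for every pixel (architecture is an assumption).
inverse = nn.Sequential(nn.Linear(len(T), 64), nn.ReLU(), nn.Linear(64, 4), nn.Sigmoid())

def decode(params):
    tau_c = 1e-5 + params[:, 0] * 1e-2      # bounded reparameterizations
    return tau_c, params[:, 1], params[:, 2], params[:, 3] * 0.5

measured_k2 = mesi_forward(*decode(torch.rand(4096, 4))).detach()  # stand-in data
opt = torch.optim.Adam(inverse.parameters(), lr=1e-3)
for step in range(200):                      # self-supervised: no labels
    opt.zero_grad()
    resynth = mesi_forward(*decode(inverse(measured_k2)))
    loss = ((resynth - measured_k2) ** 2).mean()
    loss.backward()
    opt.step()
# Inference is one feed-forward pass: ICT maps come from decode(inverse(K2)).
```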
Ma, S.; Xu, M.; Dao, M.; Li, H.
Microscopy-based analysis of red blood cell (RBC) morphology is widely used to study phenotypes in sickle cell disease (SCD). Although AI models have been developed to automate classification, most are trained on pre-cropped single-cell images and thus struggle with full-scope microscopic images containing densely packed cells and diverse morphologies, which require both accurate detection and fine-grained classification. We propose an end-to-end computational framework to identify individual RBCs in full-scope microscopy images and classify them into five morphological categories: discocytes (DO), echinocytes (E), elongated and sickle-shaped cells (ES), granular cells (G), and reticulocytes (R). We first evaluate advanced detection-classification models, including You Only Look Once (YOLO) and Detection Transformers (DETR), and demonstrate that while these models effectively detect cells, their classification performance falls short of specialized classifiers trained on single-cell images, particularly for minority phenotypes. To address this limitation, we introduce a two-step framework in which a YOLO-based detector localizes and crops individual cells from full-scope images, followed by a fine-tuned DenseNet121 ensemble classifier that assigns each cell to one of the five morphological categories. The proposed framework achieves a detection-level F1-score of 0.9661 and a weighted-average classification F1-score of 0.9708, with an overall classification accuracy of 97.06%. Compared with the single-step YOLO26n baseline, the two-step pipeline yields a macro-average F1-score improvement of +0.1675, with particularly substantial gains for minority classes (E: +0.1623; G: +0.2774; R: +0.2603). Overall, this hybrid framework demonstrates a practical strategy for adapting fast, general-purpose detection models to domain-specific biomedical tasks by combining them with specialized classifiers, delivering both efficiency and high accuracy for scientific and clinical image analysis.
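A minimal sketch of the two-step detect-then-classify pipeline, assuming hypothetical weight files ("rbc_detector.pt", "rbc_densenet121.pt") and a single DenseNet121 rather than the paper's ensemble; the ultralytics and torchvision calls are standard, everything else is illustrative.

```python
import torch
from torchvision import models, transforms
from ultralytics import YOLO
from PIL import Image

CLASSES = ["DO", "E", "ES", "G", "R"]

# Hypothetical weights: a detector trained to localize RBCs and a classifier
# fine-tuned on single-cell crops, as the abstract describes.
detector = YOLO("rbc_detector.pt")
classifier = models.densenet121(num_classes=len(CLASSES))
classifier.load_state_dict(torch.load("rbc_densenet121.pt"))
classifier.eval()

prep = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def classify_cells(image_path: str):
    """Step 1: detect and crop each RBC; step 2: classify each crop."""
    img = Image.open(image_path).convert("RGB")
    boxes = detector(image_path)[0].boxes.xyxy.cpu().numpy()
    out = []
    with torch.no_grad():
        for x1, y1, x2, y2 in boxes:
            crop = prep(img.crop((x1, y1, x2, y2))).unsqueeze(0)
            out.append((CLASSES[classifier(crop).argmax(1).item()], (x1, y1, x2, y2)))
    return out
```

The split lets a fast, general-purpose detector handle dense full-scope scenes while a specialized classifier handles the fine-grained, minority-class distinctions it struggles with.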
Pan, Y.; Yuan, X.; Liu, H.; Yang, Y.; Kang, G.
Magnetic resonance imaging (MRI) is a cornerstone of modern neuroimaging, where accurate segmentation of brain structures and lesions is essential for diagnosis, treatment planning, and clinical research. However, most current foundation models are trained on mixed-organ datasets, while the anatomical structures of the brain differ substantially from those of other organs such as the lungs and kidneys. As a result, these models often struggle to adapt to the distinctive characteristics of brain tissue. In this work, we present Brain-SAM, a model tailored for brain MRI segmentation. Brain-SAM extends the Segment Anything Model 2 (SAM2) framework by enabling the Hiera encoder to directly process 3D volumetric data and introducing a UNETR-inspired decoder for hierarchical feature decoding. The model preserves the interactive segmentation paradigm of SAM while also supporting fully automatic segmentation. Trained on multiple brain MRI datasets covering brain tumors, stroke, and epilepsy, Brain-SAM demonstrated superior performance to state-of-the-art methods. Compared with nnU-Net, it achieved Dice score improvements of 22%, 9%, and 6% on epileptic lesions, brain metastases, and meningiomas, respectively. Notably, Brain-SAM showed clear advantages in small-lesion segmentation, achieving 15%-18% higher Dice compared with other strong baseline models. We believe that Brain-SAM may offer a useful pre-trained model for downstream brain MRI analysis tasks, and could contribute to future research and clinical applications. Our code and models are available at https://github.com/DLbrainsam/Brain-SAM.
Xie, C.; Wang, Y.; Li, D.; Yu, B.; Peng, S.; Wu, L.; Yang, M.
Handheld ultrasound devices have revolutionized point-of-care diagnostics, but their effectiveness remains limited by operator dependency and the need for specialized training. This paper presents an intelligent guidance and diagnostic assistance system for a handheld wireless ultrasound device, enabling automated carotid artery and thyroid examinations through handheld operation. Drawing inspiration from the Actor-Critic framework, we implement a simulation-based reinforcement learning approach for real-time probe navigation toward standard anatomical views. The system integrates YOLOv8n-based detection networks for carotid plaque and thyroid nodule identification, achieving real-time inference at 30 frames per second. Furthermore, we propose a hybrid measurement approach combining UNet segmentation with the Snake algorithm for precise biometric quantification, including carotid intima-media thickness (IMT), lumen diameter, and lesion dimensions. Experimental validation on clinical datasets demonstrates that the proposed system achieves 91.2% accuracy in standard plane acquisition, 87.5% mean average precision (mAP) for plaque detection, and 89.3% mAP for nodule identification. Measurement results show excellent agreement with expert sonographers, with IMT measurements exhibiting a mean absolute difference of 0.08 mm. These findings demonstrate the feasibility of intelligent handheld ultrasound examination, significantly reducing operator dependency while maintaining diagnostic accuracy comparable to experienced clinicians.
Qiu, P.; An, Z.; Ha, S.; Kumar, S.; Yu, X.; Sotiras, A.
Multimodal medical image analysis exploits complementary information from multiple data sources (e.g., multi-contrast Magnetic Resonance Imaging (MRI), Diffusion Tensor Imaging (DTI), and Positron Emission Tomography (PET)) to enhance diagnostic accuracy and support clinical decision making. Central to this process is the learning of robust representations that capture both modality-invariant and modality-specific features, which can then be leveraged for downstream tasks such as MRI segmentation and normative modeling of population-level variation and individual deviations. However, learning robust and generalizable representations becomes particularly challenging in the presence of missing modalities and heterogeneous data distributions. Most existing methods address this challenge primarily from a statistical perspective, yet they lack a theoretical understanding of the underlying geometric behavior, such as how probability mass is allocated across modalities. In this paper, we introduce a generalized geometric perspective for multimodal representation learning grounded in the concept of barycenters, which unifies a broad class of existing methods under a common theoretical perspective. Building on this barycentric formulation, we propose a novel approach that leverages generalized Wasserstein barycenters with hierarchical modality-specific priors to better preserve the geometry of unimodal distributions and enhance representation quality. We evaluated our framework on two key multimodal tasks, brain tumor MRI segmentation and normative modeling, demonstrating consistent improvements over a variety of multimodal approaches. Our results highlight the potential of scalable, theoretically grounded approaches to advance robust and generalizable representation learning in medical imaging applications.
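For concreteness, the classical Wasserstein-barycenter objective that the abstract's generalized formulation builds on, with μ_k the feature distribution of modality k; how the hierarchical modality-specific priors enter (e.g., through the weights λ_k or per-modality reference measures) is not specified in the abstract.

\[
\nu^{\star} \;=\; \arg\min_{\nu}\; \sum_{k=1}^{K} \lambda_k\, W_2^2(\mu_k, \nu),
\qquad \lambda_k \ge 0,\quad \sum_{k=1}^{K} \lambda_k = 1,
\]

i.e., the fused representation is the distribution minimizing the weighted sum of squared 2-Wasserstein distances to each modality's distribution, which remains well defined when some μ_k are missing.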
Dadgar-Kiani, E.; Hebbale, V.; Attalla, G.; Alvarez, J. L.; Dunsford, S.; Caulfield, K. A.; Good, C. H.; Krystal, A. D.; Sugrue, L. P.; Fan, J. M.; Fouragnan, E.; Pichardo, S.; Butts Pauly, K.; Murphy, K. R.
Focused ultrasound can be delivered through the temporal window to modulate heterogeneously located brain areas. Acoustic simulations allow for safety assessments when dynamically targeting brain structures, but the mismatch between simulation and measured focal pressure can vary across the steerable range due to mechanically inaccurate assumptions made about the skull and transducer. Here, we describe efficient methods for simulation-measurement calibration using axisymmetric projections and sparse sampling across a 3D steerable subspace encompassing deep brain targets across 157 subjects. To address the simulation-reality mismatch in skull transmission, we used the measured and predicted pressure values through eight human temporal window fragments to derive an optimized bone attenuation coefficient. Collectively, the calibration framework and optimized temporal window coefficients can be used broadly across studies to improve the accuracy of reporting and dependent safety assessment for personalized neuromodulation treatments.
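A minimal sketch of the attenuation-coefficient optimization step, with entirely hypothetical pressures, fragment thicknesses, and reference coefficient; the Beer-Lambert-style amplitude rescaling is a simplifying assumption standing in for rerunning the acoustic simulation at each candidate value.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical measured vs. simulated focal pressures through eight temporal
# window fragments; in the study these come from hydrophone measurements and
# simulations at a reference bone attenuation coefficient alpha_ref.
p_measured = np.array([0.42, 0.51, 0.38, 0.47, 0.55, 0.40, 0.44, 0.49])  # MPa
p_sim_ref  = np.array([0.55, 0.62, 0.50, 0.58, 0.66, 0.52, 0.56, 0.60])  # MPa
thickness  = np.array([3.1, 2.8, 3.4, 3.0, 2.6, 3.5, 3.2, 2.9]) * 1e-3   # m
alpha_ref  = 500.0                                                        # Np/m (assumed)

def loss(alpha):
    """Rescale the reference-simulation pressures to a candidate attenuation
    alpha via exponential amplitude decay and compare with measurement."""
    p_pred = p_sim_ref * np.exp(-(alpha - alpha_ref) * thickness)
    return np.sum((p_pred - p_measured) ** 2)

res = minimize_scalar(loss, bounds=(0.0, 2000.0), method="bounded")
print(f"optimized attenuation coefficient: {res.x:.0f} Np/m")
```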
Assaf, O.; Guvenis, A.
Computed Tomography (CT) is one of the largest contributors to radiation exposure from medical imaging, which can induce DNA damage and increase cancer risk. Reducing CT radiation dose to improve patient safety inherently increases image noise and artifacts. Generative adversarial networks (GANs) have shown promise for unsupervised low-dose CT (LDCT) denoising. Building on this, RDBCycleGAN-CBAM, a CycleGAN-based model that integrates residual dense blocks (RDBs) and convolutional block attention modules (CBAM), was developed to effectively denoise quarter-dose CT images while preserving structural detail. The model was trained on unpaired quarter-dose and full-dose CT scans from the NIH-AAPM-Mayo dataset using adversarial (LSGAN), cycle-consistency, and identity losses. Evaluation on held-out test slices was performed using peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) as the primary image-quality metrics. The results demonstrate that the proposed RDBCycleGAN-CBAM method not only achieves higher PSNR and SSIM values but also outperforms most existing deep learning-based methods, achieving mean improvements of +3.97 dB in PSNR and +0.053 in SSIM relative to quarter-dose inputs. Shapiro-Wilk tests for PSNR and SSIM motivated the use of the nonparametric Wilcoxon signed-rank test, which demonstrated highly significant improvements in both metrics. The very large rank-biserial correlation values (1.0) indicate that nearly all test images experienced substantial quality improvement. Furthermore, the narrow bootstrap confidence intervals for the mean differences suggest that these improvements are consistent across the dataset. These advancements contribute to medical imaging by providing a viable, vendor-neutral tool for reducing patient radiation exposure without compromising diagnostic value.
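The training objective is the standard CycleGAN combination the abstract lists, with least-squares adversarial terms; the weights λ_cyc and λ_id are not given in the abstract.

\[
\mathcal{L} = \mathcal{L}_{\mathrm{LSGAN}}(G, D_Y) + \mathcal{L}_{\mathrm{LSGAN}}(F, D_X)
+ \lambda_{\mathrm{cyc}}\, \mathbb{E}\!\left[\lVert F(G(x)) - x \rVert_1 + \lVert G(F(y)) - y \rVert_1\right]
+ \lambda_{\mathrm{id}}\, \mathbb{E}\!\left[\lVert G(y) - y \rVert_1 + \lVert F(x) - x \rVert_1\right],
\]

where G maps quarter-dose to full-dose, F the reverse, and, for example,
\[
\mathcal{L}_{\mathrm{LSGAN}}(G, D_Y) = \mathbb{E}_y\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_x\big[D_Y(G(x))^2\big].
\]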
Jia, Y.; Niu, J.; Qie, Z.; Li, Z.; Laine, A. F.; Guo, J.
Accurate classification of brain tumors from MRI is critical for guiding clinical decision-making; however, existing deep learning models are often hindered by limited interpretability and pronounced sensitivity to hyperparameter selection, which constrain their reliability in medical settings. To address these challenges, we propose TumorCLIP, a lightweight and training-efficient vision-language framework that integrates radiology-informed text prototypes with a DenseNet-based visual encoder to support clinically meaningful semantic reasoning, fused via a Tip-Adapter mechanism. TumorCLIP does not aim to introduce a new vision-language model architecture. Instead, its contribution lies in the integration of radiology-informed text prototypes tailored to MRI interpretation, a systematic evaluation of backbone stability across diverse visual architectures, and a lightweight, training-efficient CLIP-based fusion framework designed for medical imaging applications. We first conduct a comprehensive unimodal benchmark across eight representative visual backbones (EfficientNet-B0, MobileNetV3-Large, ResNet50, DenseNet121, ViT, DeiT, Swin Transformer, and MambaOut) using a standardized optimizer and learning-rate grid search, revealing performance swings exceeding 60 percentage points depending on hyperparameter choices. DenseNet121 shows the strongest stability-accuracy trade-off within our evaluated optimizer and learning-rate grid (97.6% accuracy). Leveraging this foundation, TumorCLIP fuses image features with frozen CLIP-derived text prototypes, achieving concept-level explainability, robust few-shot adaptation, and enhanced classification of minority tumor classes. On the test set, TumorCLIP attains 98.5% accuracy, including a +1.86 percentage point recall increase for Neurocytoma, suggesting that radiology-informed textual priors can improve semantic alignment and help refine diagnostic decision boundaries within the evaluated setting. Additional evaluation on an independent external dataset shows that TumorCLIP achieves improved cross-dataset performance under the evaluated distribution shift, relative to the unimodal DenseNet121 baseline. These results demonstrate TumorCLIP as a practical, interpretable, and data-efficient alternative to conventional visual classifiers, providing evidence for radiology-aware vision-language alignment in MRI-based brain tumor classification. All results are reported within the evaluated datasets and training protocols.
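A minimal sketch of Tip-Adapter-style fusion as published by Zhang et al., which the abstract names as its fusion mechanism; the shapes, α, β, and the 100x logit scale follow the original Tip-Adapter convention and are assumptions about this paper's settings.

```python
import torch

def tip_adapter_logits(f, cache_keys, cache_vals, text_protos, alpha=1.0, beta=5.5):
    """Training-free Tip-Adapter fusion: blend zero-shot similarity to text
    prototypes with a key-value cache of few-shot image features.
    All feature tensors are L2-normalized; shapes are assumptions.
      f:           (B, d)  query image features
      cache_keys:  (N, d)  few-shot training features
      cache_vals:  (N, C)  one-hot labels of the cached features
      text_protos: (C, d)  radiology-informed text prototype embeddings"""
    clip_logits = 100.0 * f @ text_protos.t()                 # zero-shot branch
    affinity = torch.exp(-beta * (1.0 - f @ cache_keys.t()))  # cache branch
    return clip_logits + alpha * affinity @ cache_vals

B, d, N, C = 4, 512, 80, 4
norm = lambda t: t / t.norm(dim=-1, keepdim=True)
logits = tip_adapter_logits(norm(torch.randn(B, d)), norm(torch.randn(N, d)),
                            torch.eye(C).repeat(N // C, 1), norm(torch.randn(C, d)))
print(logits.shape)  # torch.Size([4, 4])
```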
Shukla, A.; Rao, A.; Siddharth, S.; Bao, R.
Chest radiography (CXR) is a primary modality for assessing cardiopulmonary conditions, but its effectiveness is limited by anatomical obstructions (e.g., ribs, clavicles) that hinder accurate pneumothorax segmentation, boundary delineation, and severity estimation. While deep learning-based bone suppression improves soft-tissue visibility, its utility for precise pixel-wise localization remains underexplored. This study investigates the downstream application of bone suppression for pneumothorax segmentation, integrating it as a preprocessing step to mitigate bony obscuration. We evaluate its impact across CNN and Vision Transformer models on two public datasets, where models trained on bone-suppressed CXRs significantly outperform (p < 0.05) non-suppressed counterparts, achieving up to 17% improvement in Mean Average Surface Distance (MASD), 4.9% in Dice Similarity Coefficient (DSC), and 5.9% in Normalized Surface Dice (NSD), alongside a 9.5% gain in Matthews Correlation Coefficient (MCC). These results demonstrate bone suppression as an architecture-independent enhancement for pneumothorax localization, improving the reliability of automated CXR interpretation.
Qu, B.; Liu, W.; Zhou, L.; Guo, X.; Malin, B.; Yin, Z.
Dense breast tissue diminishes the sensitivity of mammographic screening and is a key cancer risk factor, which motivates accurate segmentation under scarce and expensive expert annotations in the medical imaging domain. Here, we benchmark the effect of backbone architecture, self-supervised pre-training (SSL), fine-tuning strategy, and loss design for dense-tissue segmentation on a small expert-labeled dataset (596 images) and an in-domain unlabeled corpus (20,000 images), reflecting the lack of large public pixel-level density datasets. CNNs (EfficientNet, Xception, nnUNet) clearly outperform transformer and Medical-SAM2 models, and full or layer-wise fine-tuning reliably exceeds parameter-efficient updates. Generic image-only SSL (MIM, SimCLR, Barlow Twins) often yields negligible or negative gains over ImageNet initialization, whereas a simple multi-view contrastive SSL and a hybrid segmentation-density loss provide the best accuracy and calibration (e.g., MAE from 14.8% to 11.8%, Spearman correlation with the four BI-RADS breast density categories from 0.42 to 0.51 on VinDr). We also quantify GPU hours for different SSL and fine-tuning choices, showing that only a small set of protocols, such as EfficientNet with multi-view SSL, hybrid loss, and full fine-tuning, offers favorable accuracy-efficiency trade-offs. These findings provide practical defaults for annotation-limited mammography studies and support compute-conscious deployment of automatic breast density assessment in web-based screening workflows.
Mauri, C.; Mckenzie, A.; Analoro, C.; Yeon, E.; Coviello, R.; Mora, J.; Chollet, E.; Deden Binder, L.; Mahar, A.; Lin, S.; Benlahcen, M.; Ream, A.; Jama, A.; Garcia, I.; Tran, N.; Onta, P.; Wood, S.; Willis, A.; Mahmood, A.; Sinoballa, G.; Malki, A.; Tran, K.; Malireddy, V.; Onumajuru, N.; Lakshmanan, S.; Hercules Landaverde, K.; Sidow, R.; Wood, D.; Nguyen, B.; Hernandez, J.; Bernier, M.; Hunter, J.; Malki, A.; Tum, A.; Chavez, V.; Shahu, Z.; Vasi, I.; Visser, A.; Ghaouta, Z.; Bond, F.; Vigneshwaran, R.; Kirkpatrick, E.; Avalos Barbosa, M.; Rauh, K.; Herisse, R.; Garcia Pallares, E.; Zeng, X.
The cerebral vasculature is central to brain function, with alterations linked to numerous cerebrovascular and neurological disorders. Yet, no single imaging modality can capture the entire cerebral vascular network in humans. Instead, an array of techniques are sensitized to different spatial scales, while trading off resolution for coverage. Magnetic Resonance Imaging (MRI) typically resolves only large pial vessels, while high-resolution microscopy allows micrometer-scale vessels to be mapped over limited spatial extents. These techniques must therefore be combined to obtain a complete mapping of the cerebral angioarchitecture, which underscores the need for automatic, cross-modal vessel segmentation. Here, we introduce VesSynth, a flexible vessel segmentation framework that achieves state-of-the-art accuracy across multiple modalities and spatial resolutions (MR, optical and X-ray imaging), despite being trained entirely on synthetic data. By enabling consistent vascular mapping across scales, this framework paves the way to comprehensive investigation of cerebrovascular organization and its role in health and disease.
Ruth, P. S.; DeBenedetti, T.; O'Brien, L.; Landay, J. A.; Coleman, T.; Fox, E. B.
Vascular waveforms, which measure bulk flow in blood vessels, are widely used to measure vital signs, diagnose conditions, and predict long-term health outcomes. Analyzing vascular waveforms depends on three fundamentally interdependent tasks: signal filtering, pulse timing detection, and pulse shape extraction. We hypothesized that Bayesian pulse deconvolution can achieve improved performance on all three tasks by solving them jointly. This method uses an analytical, generative model of vascular waveforms with priors informed by physical and biological domain knowledge. In simulations, Bayesian pulse deconvolution achieves better performance on all tasks compared with existing algorithms: 90% reduction of median filtering error, 60% reduction in pulse timing error, and 85% reduction in shape extraction error. The advantages in simulations extend to human recordings of photoplethysmography waveforms. Taking real time-synchronized electrocardiogram R-R intervals as a proxy ground truth, Bayesian pulse deconvolution achieves 40% lower pulse interval estimation error (RMSE = 5.1 ms) compared with typical algorithms (RMSE = 8.3 ms, p=1e-10). By extracting more accurate and informative insights from vascular waveforms, Bayesian pulse deconvolution could advance a wide array of health technologies that rely on interpreting signals from blood vessels.
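One way to write the class of generative model the abstract describes, with pulse times t_i, amplitudes a_i, a shared pulse shape h parameterized by φ, and Gaussian observation noise; the specific physiology-informed priors and the inference scheme are the authors' and are not shown here.

\[
y(t) = \sum_{i} a_i\, h(t - t_i;\, \boldsymbol{\phi}) + \varepsilon(t),
\qquad \varepsilon(t) \sim \mathcal{N}(0, \sigma^2),
\]
\[
p\big(\{t_i, a_i\}, \boldsymbol{\phi}, \sigma \,\big|\, y\big) \;\propto\;
p\big(y \,\big|\, \{t_i, a_i\}, \boldsymbol{\phi}, \sigma\big)\,
p\big(\{t_i\}\big)\, p\big(\{a_i\}\big)\, p(\boldsymbol{\phi})\, p(\sigma).
\]

Because the posterior is over pulse times, amplitudes, and shape jointly, filtering, timing detection, and shape extraction are solved by the same inference rather than as separate pipeline stages.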
Djebbara, I.; Yin, Z.; Friismose, A. I.; Poulsen, F. R.; Hojo, E.; Aunan-Diop, J. S.
Mechanical properties of biological tissues vary across spatial scales, yet radiomics typically relies on fixed, heuristic choices for neighbourhood size, kernel geometry, and spectral content - choices that can silently reshape the feature space before any modelling begins. We introduce a label-free, information-theoretic framework for selecting extraction parameters in multi-frequency MRE radiomics. For each configuration θ - neighbourhood radius r, kernel geometry k (sphere or shell), and frequency subset f - we extract a radiomics feature matrix and score it using an objective J(θ) that integrates distributional richness (Shannon entropy), cross-frequency coherence (canonical correlation), inter-feature redundancy (Spearman correlation), and bootstrap stability. We evaluate 121 configurations per tissue in multi-frequency MRE (30-60 Hz) of human brain, liver, and a calibrated phantom, and test robustness using 10,000 Dirichlet-sampled objective weightings. Across tissues, neighbourhood aggregation is consistently preferred over voxel-wise extraction, outperforming the no-neighbourhood baseline in 98.4-100% of weightings. External validation in 100 independent brain scans acquired with a different protocol and wider frequency range (20-90 Hz) confirms a reproducible mesoscopic plateau at r = 3-5 (9-15 mm), with a modal optimum at r = 4; omitting neighbourhood analysis reduces J(θ) by 38% relative to each subject's optimum. Frequency-subset preferences replicate across datasets, with lower frequencies most frequently selected for brain. By turning ad hoc extraction choices into an outcome-free optimisation step, this framework improves reproducibility, reduces sensitivity to heuristic parameter choices, and generalises across acquisition protocols and imaging sites.
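Schematically, the objective combines the four scores named above; the exact normalization and sign conventions are assumptions, and the weights (w_1, ..., w_4) are what the 10,000 Dirichlet draws sample (a symmetric Dirichlet is assumed here).

\[
J(\theta) = w_1\, H(\theta) \;+\; w_2\, \rho_{\mathrm{CCA}}(\theta) \;-\; w_3\, \bar{\rho}_{\mathrm{Spearman}}(\theta) \;+\; w_4\, S_{\mathrm{boot}}(\theta),
\qquad (w_1, \dots, w_4) \sim \mathrm{Dirichlet}(\mathbf{1}),
\]

where H is Shannon entropy of the feature distributions, ρ_CCA the cross-frequency canonical correlation, ρ̄_Spearman the mean inter-feature redundancy (penalized), and S_boot the bootstrap stability; configurations whose ranking survives most weight draws are the robust optima reported above.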